Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B
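
The two memory figures can be cross-checked with simple arithmetic. The sketch below assumes that every one of the 9 columns is stored as an 8-byte value and that the index adds a fixed 128 bytes; both are assumptions that are not stated in the report, although they reproduce the reported numbers.

```python
n_rows, n_columns = 768, 9
bytes_per_value = 8   # assumption: each column is held as int64 or float64
index_bytes = 128     # assumption: overhead of a default integer range index

total_bytes = n_rows * n_columns * bytes_per_value + index_bytes
total_kib = total_bytes / 1024          # KiB = 1024 bytes
avg_record_bytes = total_bytes / n_rows

print(f"total: {total_kib:.1f} KiB, average record: {avg_record_bytes:.1f} B")
```

Under these assumptions the total rounds to 54.1 KiB and the average record to 72.2 B, which agrees with the figures above.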

Variable types

Numeric: 8
Categorical: 1

Warnings

Pregnancies is highly correlated with Age (High correlation)
Age is highly correlated with Pregnancies (High correlation)
Insulin is highly correlated with Outcome (High correlation)
Outcome is highly correlated with Insulin (High correlation)
BMI is highly correlated with SkinThickness (High correlation)
SkinThickness is highly correlated with BMI (High correlation)
Pregnancies has 111 (14.5%) zeros (Zeros)
Insulin has 374 (48.7%) zeros (Zeros)
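
The percentages in the zeros warnings follow directly from the counts and the 768 observations. A minimal sketch of that arithmetic, using only numbers quoted in the report (the helper name `zero_share` is made up for illustration):

```python
# Counts taken from the report: zeros per column and total observations.
N_OBSERVATIONS = 768
zero_counts = {"Pregnancies": 111, "Insulin": 374}

def zero_share(zeros: int, total: int) -> float:
    """Return the percentage of zero-valued cells, rounded to one decimal."""
    return round(100 * zeros / total, 1)

shares = {name: zero_share(count, N_OBSERVATIONS) for name, count in zero_counts.items()}
print(shares)
```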

Reproduction

Analysis started: 2021-08-26 20:49:02.176462
Analysis finished: 2021-08-26 20:49:12.514679
Duration: 10.34 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

Pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.845052083
Minimum: 0
Maximum: 17
Zeros: 111
Zeros (%): 14.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 3
Q3: 6
95th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.369578063
Coefficient of variation (CV): 0.8763413316
Kurtosis: 0.1592197775
Mean: 3.845052083
Median Absolute Deviation (MAD): 2
Skewness: 0.9016739792
Sum: 2953
Variance: 11.35405632
Monotonicity: Not monotonic
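
Several of these statistics are derived from one another, so they can be checked for consistency. The sketch below recomputes them from the Pregnancies figures in the report, assuming the usual definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative.

```python
# Values copied from the Pregnancies summary in the report.
n = 768
total = 2953
std = 3.369578063
q1, q3 = 1, 6
minimum, maximum = 0, 17

mean = total / n                  # sum divided by the number of observations
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(f"mean={mean:.9f} cv={cv:.10f} variance={variance:.8f} iqr={iqr} range={value_range}")
```

Up to rounding in the last printed digit, the results agree with the reported mean, CV, variance, IQR and range.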
Histogram with fixed size bins (bins=17)
Common values

Value             Count  Frequency (%)
1                 135    17.6%
0                 111    14.5%
2                 103    13.4%
3                 75     9.8%
4                 68     8.9%
5                 57     7.4%
6                 50     6.5%
7                 45     5.9%
8                 38     4.9%
9                 28     3.6%
Other values (7)  58     7.6%

Smallest 10 values

Value             Count  Frequency (%)
0                 111    14.5%
1                 135    17.6%
2                 103    13.4%
3                 75     9.8%
4                 68     8.9%
5                 57     7.4%
6                 50     6.5%
7                 45     5.9%
8                 38     4.9%
9                 28     3.6%

Largest 10 values

Value             Count  Frequency (%)
17                1      0.1%
15                1      0.1%
14                2      0.3%
13                10     1.3%
12                9      1.2%
11                11     1.4%
10                24     3.1%
9                 28     3.6%
8                 38     4.9%
7                 45     5.9%

Glucose
Real number (ℝ≥0)

Distinct: 136
Distinct (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.8945312
Minimum: 0
Maximum: 199
Zeros: 5
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 79
Q1: 99
Median: 117
Q3: 140.25
95th percentile: 181
Maximum: 199
Range: 199
Interquartile range (IQR): 41.25
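
Q3 is fractional (140.25) although the Glucose values listed in this report are whole numbers, which is what one would expect if quantiles are interpolated linearly between neighbouring sorted values. The sketch below shows that interpolation on a small made-up sample; that the report uses exactly this rule is an assumption, and the `quantile` helper and the sample are hypothetical.

```python
def quantile(values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)        # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

sample = [99, 117, 140, 141]   # hypothetical values, not taken from the dataset
q1 = quantile(sample, 0.25)
q3 = quantile(sample, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)
```

With four sorted values the 0.75 quantile sits a quarter of the way between the third and fourth values, so integer data can still yield a fractional quartile.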

Descriptive statistics

Standard deviation: 31.9726182
Coefficient of variation (CV): 0.2644670347
Kurtosis: 0.6407798204
Mean: 120.8945312
Median Absolute Deviation (MAD): 20
Skewness: 0.1737535018
Sum: 92847
Variance: 1022.248314
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
99                  17     2.2%
100                 17     2.2%
111                 14     1.8%
129                 14     1.8%
125                 14     1.8%
106                 14     1.8%
112                 13     1.7%
108                 13     1.7%
95                  13     1.7%
105                 13     1.7%
Other values (126)  626    81.5%

Smallest 10 values

Value               Count  Frequency (%)
0                   5      0.7%
44                  1      0.1%
56                  1      0.1%
57                  2      0.3%
61                  1      0.1%
62                  1      0.1%
65                  1      0.1%
67                  1      0.1%
68                  3      0.4%
71                  4      0.5%

Largest 10 values

Value               Count  Frequency (%)
199                 1      0.1%
198                 1      0.1%
197                 4      0.5%
196                 3      0.4%
195                 2      0.3%
194                 3      0.4%
193                 2      0.3%
191                 1      0.1%
190                 1      0.1%
189                 4      0.5%

BloodPressure
Real number (ℝ≥0)

Distinct: 46
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.38671875
Minimum: 24
Maximum: 122
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 24
5th percentile: 52
Q1: 64
Median: 72
Q3: 80
95th percentile: 90
Maximum: 122
Range: 98
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.09664173
Coefficient of variation (CV): 0.1671113423
Kurtosis: 1.098238954
Mean: 72.38671875
Median Absolute Deviation (MAD): 8
Skewness: 0.1418850201
Sum: 55593
Variance: 146.3287412
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=46)
Common values

Value              Count  Frequency (%)
72                 79     10.3%
70                 57     7.4%
74                 52     6.8%
78                 45     5.9%
68                 45     5.9%
64                 43     5.6%
80                 40     5.2%
76                 39     5.1%
60                 37     4.8%
62                 34     4.4%
Other values (36)  297    38.7%

Smallest 10 values

Value              Count  Frequency (%)
24                 1      0.1%
30                 2      0.3%
38                 1      0.1%
40                 1      0.1%
44                 4      0.5%
46                 2      0.3%
48                 5      0.7%
50                 13     1.7%
52                 11     1.4%
54                 11     1.4%

Largest 10 values

Value              Count  Frequency (%)
122                1      0.1%
114                1      0.1%
110                3      0.4%
108                2      0.3%
106                3      0.4%
104                2      0.3%
102                1      0.1%
100                3      0.4%
98                 3      0.4%
96                 4      0.5%

SkinThickness
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 51
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.81770833
Minimum: 7
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 7
5th percentile: 14.35
Q1: 25
Median: 35
Q3: 72
95th percentile: 72
Maximum: 99
Range: 92
Interquartile range (IQR): 47

Descriptive statistics

Standard deviation: 21.44798988
Coefficient of variation (CV): 0.5128925218
Kurtosis: -1.298015273
Mean: 41.81770833
Median Absolute Deviation (MAD): 13
Skewness: 0.4395765924
Sum: 32116
Variance: 460.0162701
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
72                 227    29.6%
32                 31     4.0%
30                 27     3.5%
27                 23     3.0%
23                 22     2.9%
33                 20     2.6%
28                 20     2.6%
18                 20     2.6%
31                 19     2.5%
19                 18     2.3%
Other values (41)  341    44.4%

Smallest 10 values

Value              Count  Frequency (%)
7                  2      0.3%
8                  2      0.3%
10                 5      0.7%
11                 6      0.8%
12                 7      0.9%
13                 11     1.4%
14                 6      0.8%
15                 14     1.8%
16                 6      0.8%
17                 14     1.8%

Largest 10 values

Value              Count  Frequency (%)
99                 1      0.1%
72                 227    29.6%
63                 1      0.1%
60                 1      0.1%
56                 1      0.1%
54                 2      0.3%
52                 2      0.3%
51                 1      0.1%
50                 3      0.4%
49                 3      0.4%

Insulin
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 186
Distinct (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.79947917
Minimum: 0
Maximum: 846
Zeros: 374
Zeros (%): 48.7%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 30.5
Q3: 127.25
95th percentile: 293
Maximum: 846
Range: 846
Interquartile range (IQR): 127.25

Descriptive statistics

Standard deviation: 115.2440024
Coefficient of variation (CV): 1.444169856
Kurtosis: 7.214259554
Mean: 79.79947917
Median Absolute Deviation (MAD): 30.5
Skewness: 2.272250858
Sum: 61286
Variance: 13281.18008
Monotonicity: Not monotonic
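
For Insulin the MAD equals the median (both 30.5). The median absolute deviation is the median of the absolute distances from the median, so a large block of zeros sitting 30.5 below the median can pin the MAD at that same distance. The sketch below illustrates the effect on a small made-up sample; the helper name and the values are hypothetical, and the report's own implementation may differ in detail.

```python
from statistics import median

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    centre = median(values)
    return median(abs(v - centre) for v in values)

# Hypothetical sample with many zeros, not taken from the dataset.
sample = [0, 0, 0, 61, 94, 168]
centre = median(sample)
mad = median_absolute_deviation(sample)
print(centre, mad)
```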
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                   374    48.7%
105                 11     1.4%
130                 9      1.2%
140                 9      1.2%
120                 8      1.0%
94                  7      0.9%
180                 7      0.9%
100                 7      0.9%
135                 6      0.8%
115                 6      0.8%
Other values (176)  324    42.2%

Smallest 10 values

Value               Count  Frequency (%)
0                   374    48.7%
14                  1      0.1%
15                  1      0.1%
16                  1      0.1%
18                  2      0.3%
22                  1      0.1%
23                  2      0.3%
25                  1      0.1%
29                  1      0.1%
32                  1      0.1%

Largest 10 values

Value               Count  Frequency (%)
846                 1      0.1%
744                 1      0.1%
680                 1      0.1%
600                 1      0.1%
579                 1      0.1%
545                 1      0.1%
543                 1      0.1%
540                 1      0.1%
510                 1      0.1%
495                 2      0.3%

BMI
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 248
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.02889885
Minimum: 18.2
Maximum: 72.35402906
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 18.2
5th percentile: 22.235
Q1: 27.5
Median: 32.4
Q3: 36.825
95th percentile: 45.6
Maximum: 72.35402906
Range: 54.15402906
Interquartile range (IQR): 9.325

Descriptive statistics

Standard deviation: 8.352770016
Coefficient of variation (CV): 0.2528927789
Kurtosis: 5.821632748
Mean: 33.02889885
Median Absolute Deviation (MAD): 4.8
Skewness: 1.694427767
Sum: 25366.19432
Variance: 69.76876695
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
32                  13     1.7%
31.6                12     1.6%
31.2                12     1.6%
72.35402906         11     1.4%
32.4                10     1.3%
33.3                10     1.3%
30.1                9      1.2%
32.8                9      1.2%
32.9                9      1.2%
30.8                9      1.2%
Other values (238)  664    86.5%

Smallest 10 values

Value               Count  Frequency (%)
18.2                3      0.4%
18.4                1      0.1%
19.1                1      0.1%
19.3                1      0.1%
19.4                1      0.1%
19.5                2      0.3%
19.6                3      0.4%
19.9                1      0.1%
20                  1      0.1%
20.1                1      0.1%

Largest 10 values

Value               Count  Frequency (%)
72.35402906         11     1.4%
67.1                1      0.1%
59.4                1      0.1%
57.3                1      0.1%
55                  1      0.1%
53.2                1      0.1%
52.9                1      0.1%
52.3                2      0.3%
50                  1      0.1%
49.7                1      0.1%

DiabetesPedigreeFunction
Real number (ℝ≥0)

Distinct: 517
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4718763021
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5th percentile: 0.14035
Q1: 0.24375
Median: 0.3725
Q3: 0.62625
95th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.331328595
Coefficient of variation (CV): 0.7021513764
Kurtosis: 5.594953528
Mean: 0.4718763021
Median Absolute Deviation (MAD): 0.1675
Skewness: 1.919911066
Sum: 362.401
Variance: 0.1097786379
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0.258               6      0.8%
0.254               6      0.8%
0.268               5      0.7%
0.207               5      0.7%
0.261               5      0.7%
0.259               5      0.7%
0.238               5      0.7%
0.19                4      0.5%
0.263               4      0.5%
0.299               4      0.5%
Other values (507)  719    93.6%

Smallest 10 values

Value               Count  Frequency (%)
0.078               1      0.1%
0.084               1      0.1%
0.085               2      0.3%
0.088               2      0.3%
0.089               1      0.1%
0.092               1      0.1%
0.096               1      0.1%
0.1                 1      0.1%
0.101               1      0.1%
0.102               1      0.1%

Largest 10 values

Value               Count  Frequency (%)
2.42                1      0.1%
2.329               1      0.1%
2.288               1      0.1%
2.137               1      0.1%
1.893               1      0.1%
1.781               1      0.1%
1.731               1      0.1%
1.699               1      0.1%
1.698               1      0.1%
1.6                 1      0.1%

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 52
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.24088542
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5th percentile: 21
Q1: 24
Median: 29
Q3: 41
95th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.76023154
Coefficient of variation (CV): 0.3537881556
Kurtosis: 0.6431588885
Mean: 33.24088542
Median Absolute Deviation (MAD): 7
Skewness: 1.129596701
Sum: 25529
Variance: 138.3030459
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
22                 72     9.4%
21                 63     8.2%
25                 48     6.2%
24                 46     6.0%
23                 38     4.9%
28                 35     4.6%
26                 33     4.3%
27                 32     4.2%
29                 29     3.8%
31                 24     3.1%
Other values (42)  348    45.3%

Smallest 10 values

Value              Count  Frequency (%)
21                 63     8.2%
22                 72     9.4%
23                 38     4.9%
24                 46     6.0%
25                 48     6.2%
26                 33     4.3%
27                 32     4.2%
28                 35     4.6%
29                 29     3.8%
30                 21     2.7%

Largest 10 values

Value              Count  Frequency (%)
81                 1      0.1%
72                 1      0.1%
70                 1      0.1%
69                 2      0.3%
68                 1      0.1%
67                 3      0.4%
66                 4      0.5%
65                 3      0.4%
64                 1      0.1%
63                 4      0.5%

Outcome
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB

Value  Count
0      500
1      268

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 768
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%
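
A value/count/frequency table of this kind can be rebuilt from the raw column with a simple tally. The sketch below reconstructs the Outcome column from the counts in the report (500 zeros and 268 ones) and recomputes the percentages; it illustrates the arithmetic only and is not the report generator's own code.

```python
from collections import Counter

# Rebuild the Outcome column from the counts in the report (500 zeros, 268 ones).
outcome = [0] * 500 + [1] * 268

counts = Counter(outcome)
total = sum(counts.values())

# One (value, count, percentage) row per distinct value, most frequent first.
table = [(value, count, round(100 * count / total, 1))
         for value, count in counts.most_common()]

for value, count, percent in table:
    print(f"{value}  {count}  {percent}%")
```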

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring characters

Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  768    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  768    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  768    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      500    65.1%
1      268    34.9%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for an exactly linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
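
The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample values are illustrative and do not come from the dataset; the example also shows that changing the slope and offset of an exactly linear relationship leaves r unchanged.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and of the two standard deviations
    # cancel, so plain sums of products and squares are sufficient.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1.0, 2.0, 3.0, 4.0]            # illustrative values
ys = [3.0, 5.0, 7.0, 9.0]            # exactly linear in xs
steeper = [10 * y + 7 for y in ys]   # same relationship, different slope and offset

r_line = pearson_r(xs, ys)
r_steeper = pearson_r(xs, steeper)
r_flipped = pearson_r(xs, [-y for y in ys])
print(r_line, r_steeper, r_flipped)
```

Both increasing lines give r of approximately +1, and the mirrored line gives approximately -1, up to floating-point rounding.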

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
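
As a minimal sketch of that recipe, the code below ranks both variables and then applies the Pearson formula to the ranks. Giving tied values the mean of their positions is a common convention that is assumed here rather than stated in the report; all names and sample values are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]          # illustrative values
ys = [x ** 3 for x in xs]     # monotonic but not linear in xs

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

For this monotonic but nonlinear sample ρ is approximately 1 while Pearson's r stays below 1, which is the difference the description above points at.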

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
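
The pair-counting definition translates directly into code. The sketch below implements exactly the formula as worded here (often called tau-a); statistical libraries commonly use tie-corrected variants, so their output can differ when the data contain ties. The function name and sample values are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

xs = [1, 2, 3, 4]   # illustrative values
ys = [1, 3, 2, 4]   # one swapped pair relative to xs

tau = kendall_tau_a(xs, ys)
print(tau)
```

Of the six pairs in this sample, five are concordant and one is discordant, so τ = (5 - 1) / 6.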

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI        DiabetesPedigreeFunction  Age  Outcome
0    6            148      72             35             0        33.600000  0.627                     50   1
1    1            85       66             29             0        26.600000  0.351                     31   0
2    8            183      64             72             0        23.300000  0.672                     32   1
3    1            89       66             23             94       28.100000  0.167                     21   0
4    0            137      40             35             168      43.100000  2.288                     33   1
5    5            116      74             72             0        25.600000  0.201                     30   0
6    3            78       50             32             88       31.000000  0.248                     26   1
7    10           115      72             72             0        35.300000  0.134                     29   0
8    2            197      70             45             543      30.500000  0.158                     53   1
9    8            125      96             72             0        72.354029  0.232                     54   1

Last rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI   DiabetesPedigreeFunction  Age  Outcome
758  1            106      76             72             0        37.5  0.197                     26   0
759  6            190      92             72             0        35.5  0.278                     66   1
760  2            88       58             26             16       28.4  0.766                     22   0
761  9            170      74             31             0        44.0  0.403                     43   1
762  9            89       62             72             0        22.5  0.142                     33   0
763  10           101      76             48             180      32.9  0.171                     63   0
764  2            122      70             27             0        36.8  0.340                     27   0
765  5            121      72             23             112      26.2  0.245                     30   0
766  1            126      60             72             0        30.1  0.349                     47   1
767  1            93       70             31             0        30.4  0.315                     23   0